A helicoidal transfer matrix model for inhomogeneous DNA melting 
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An inhomogeneous helicoidal nearest-neighbor model with continuous degrees of freedom is shown 
to predict the same DNA melting properties as traditional long-range Ising models, for free DNA 
molecules in solution, as well as superhelically stressed DNA with a fixed linking number constraint. 
Without loss of accuracy, the continuous degrees of freedom can be discretized using a minimal 
number of discretization points, yielding an effective transfer matrix model of modest dimension 
(d = 36). The resulting algorithms to compute DNA melting profiles are both simple and efficient. 

PACS numbers: 87.15.Aa, 87.14.Gg, 05.20.-y 



I. INTRODUCTION 

The computation of the thermal stability and statisti- 
cal physics of nucleic acids is a classical problem going 
back to the 1960's. The standard model to describe the 
untwisting and separation of both strands of a free DNA 
double-helix in solution is the Poland-Scheraga helix-coil 
model, where each base pair can be in two possible states, 
helix (closed) or coil (open) P, Q • Addition of entropy 
weights to a basic Ising model, counting the number of 
possible configurations of open loops, induces an effec- 
tive long range interaction between base pairs which is 
essential for correctly obtaining the helix specific opening 
probabilities. The most widely used algorithm for com- 
puting the opening probabilities is the recursion relation 
method of Poland Incorporating the Fixman-Freire 
approximation |3| for the loop entropy factor reduces the 
computational complexity from 0{N'^) to 0{N) in the se- 
quence length N. With the availability of fully sequenced 
genomes, the study of DNA melting or denaturation has 
become an active field of research again, with recent re- 
sults relating the physics of denaturation to the biology of 
genomes Jp, la] , reparameterizing the original loop entropy 
weights [T'I, speeding up the Poland-Fixman-Freire algo- 
rithm for whole genome sequences 8J, and generalizing 
the model to describe hybridization with mismatches of 
unequal length sequences 9] . The traditional physics ap- 
proach to compute statistical mechanical probabilities by 
transfer matrix multiplication 0, has also recently 
been revisited by Poland (l^l- While this last algorithm 
offers no improvement in computational complexity (us- 
ing matrix sparsity it is 0{N'^)), it is very simple and 
straightforward to implement. 

In vivo DNA strand separation involves interactions 
with other molecules which impose superhelical stresses 
on the DNA molecule. This is modeled by Benham's sta- 
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tistical mechanical model for stress induced duplex desta- 
bihzation (SIDD) [3, which also is a helix-coil model 
with Ising degrees of freedom. It has a long range base 
pair interaction arising through superhelical constraints 
(no loop entropy factors are added), and opening prob- 
abilities are known to correlate very well with regions 
important for transcriptional regulation |l4l . An ex- 
act solution of the model is 0(7V^) but an accelerated 
algorithm using an energy cut-off reduces this to 0{N), 
such that SIDD properties can be computed for whole 
genome sequences as well 0, 0| . 

In parallel with the helix-coil models, a distinct class of 
statistical mechanical models for DNA melting has been 
developed starting from a physically more realistic de- 
scription of a base pair as an entity which has a contin- 
uum of intermediate states in between helix or coil. These 
models are all based on the Peyrard-Bishop model 
which consists of a nonlinear particle lattice with one real 
degree of freedom per base pair describing the stretching 
of the hydrogen bonds between the bases. Nonlinearity 
and cooperativety in such a model arises already with 
a nearest-i ieig hbor interaction, no long-range interaction 
is needed Subsequent improvements to the model 

include replacing the harmonic by an anharmonic stack- 
ing energy [20| , and introducing an additional angular 
degree of freedom per base pair to model the helicoidal 
structure of DNA [Hlllill. In the latter model, sepa- 
ration of the two strands is coupled to untwisting of the 
double helix. The effect of sequence inhomogeneity on 
the melting transition in the Peyrard-Bishop model with 
harmonic and anharmonic stacking has been studied for 
random sequences p3 | and for periodic sequences |25j . 

Recent experimental developments (see [2^ for a re- 
view) have made it possible to manipulate single poly- 
meric molecules directly and thus offer access to a whole 
new range of DNA properties other than the melting phe- 
nomenon. These elasticity experiments too can be ac- 
curately modeled by yet another type of statistical me- 
chanical models consisting of a double-helix with non- 
opening base pairs connected by flexible, folding back- 
bones [2^ |2^ . In this paper however, we will be con- 
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cerned with the melting transition only and the connec- 
tion between the continuous particle-lattice models and 
the discrete helix-coil models. 

Unlike the helix-coil models, which have seen many ap- 
plications to real biological sequences, the particle-lattice 
models are mostly used to obtain a more fundamental, se- 
quence independent, physical understanding of the DNA 
melting phenomenon, such as the order of the phase tran- 
sition, the existence of nonlinear 'bubble' excitations, 
etc. (see 19j for a recent review paper). Moreover, al- 
though both types of models have been validated against 
(different) experiments, very little is known about how 
they relate to one another and whether they are in some 
sense equivalent. Here, we attempt to close the gap be- 
tween both kinds of models. We study an inhomogeneous 
particle-lattice model based on the Barbi-Cocco-Peyrard 
helicoidal model and compute its melting properties 
for some standard example sequences both under free 
conditions and with super helical stresses. 

The thermally induced melting of free DNA is obtained 
as the formally very simple transfer integral equilibrium 
solution of the helicoidal model, yet computation of the 
melting properties is a challenge in itself as, e.g., a com- 
putation of the partition function involves 0{N) numeri- 
cal integrations over an infinite integration domain. How- 
ever, Zhang et al. jl^ alread y o bserved that for the 
Dauxois-Peyrard-Bishop model [23| , the numerical inte- 
grations can be carried out using a very limited number 
of discretization points: a dimension as small as d = 70 
gave very accurate results compared to much higher di- 
mensions {d = 800 and more), and by allowing an error 
of order 10~^ with respect to the exact results, the di- 
mension could be further reduced to d « 40. For the 
helicoidal model, we have found a value of d = 36 to 
be the minimal discretization dimension. This effectively 
reduces the particle-lattice model to a nearest-neighbor 
generalized Ising model, offering the possibility to de- 
velop a very simple and very fast algorithm to compute 
DNA melting probabilities. We propose such an algo- 
rithm which moreover is numerically stable for arbitrary 
sequence lengths, avoiding underflow problems related to 
the extensivity of the free energy (i.e., the exponential 
vanishing of the partition function for diverging sequence 
length). The algorithm is as simple as Poland's matrix 
algorithm ^^'^ f^^st as any of the fastest helix-coil 
algorithms discussed above. Extension of the algorithm 
to compute correlations between different base pairs, loop 
opening probabilities, higher order moments for base pair 
opening, etc. is trivial and straightforward. 

Stress induced DNA melting is modeled by imposing 
a fixed linking number constraint on the DNA strands, 
which leads to a coupling of all angular degrees of free- 
dom in the model. However, the linking number is ther- 
modynamically conjugated to an external torque variable 
applied at both ends of the molecule. The model with 
external torque can be solved by the above transfer ma- 
trix algorithm, and although there is no equivalence of 
ensembles, the fixed linking number solution can be ob- 



tained by a complex integration over the torque variable. 
The numerical solution is 0{M N) where M is a constant 
independent of N determined by the desired accuracy of 
the torque integration, a situation similar to the analy- 
sis of stress induced DNA melting using Benham's SIDD 
model [13. 

II. THE MODEL AND ITS EQUILIBRIUM 
SOLUTION 

We consider the helicoidal model introduced by Barbi, 
Cocco and Peyrard 21,J . Unlike the original homogeneous 
model, the various energy parameters will be explicitly 
sequence dependent. A DNA sequence is a string of N 
letters {A,C,G,T}, which for convenience we translate 
(alphabetically) into a numerical sequence (s„)„ taking 
values in {1,2,3,4}. Each base pair in the model has 
two degrees of freedom, a radial variable r, related to the 
opening of the hydrogen bonds, and an angular variable 
(j), related to the twisting of the base pair and responsi- 
ble for the 3-dimensional structure of the DNA molecule. 
Successive angles are restricted to ipn+i — 4>n ^ [0, tt] 
to enforce helical geometry. Alternatively, we can as- 
sociate a radial variable r to the sites of the lattice, and 
an angular variable G [0, tt] to the bonds of the lattice 

(6»„ = - 

The potential energy is given by 

N 

V = ^Ds^ (e-°=" - 1)^ 

+ E Ks^,s^^Arn+i - r„)'e-"('^"+'-"+^-2'-") (1) 

n=l 

Ti— 1 n— 1 

The first term is the Morse potential modeling the hy- 
drogen bonds between the two nucleotides in a base pair 
[l8j| . the second term is the anharmonic stacking inter- 
action between successive base pairs l2^, the third term 
is a harmonic twist energy allowing fluctuations of the 
length £n,n+i between successive nucleotides on the same 
DNA strand [U (see also Appendix , and the last 
term, which can be written as —T{(f>]\[ — (/)i), is the ex- 
ternal torque or superhelical twist. F > overtwists 
the DNA-molecule, inhibiting denaturation of the two 
strands, F < causes undertwisting and enhances denat- 
uration p^ . 

We note here that the present model is only suitable 
for negative or small positive torque F. Indeed, unwind- 
ing leads to denaturation and this is well described by 
the potential energy , but severe overwinding leads to 
new DNA forms with exposed bases and the backbone 
winding at the center |2.6f| . Such transition can obviously 
not be part of the present model. 

A variety of boundary conditions (b.c.) can be con- 
sidered for the radial variable r, such as free b.c, fixed 
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b.c, or periodic b.c, with minor modifications to the nu- 
merical solution of the model. For the angular variable 
(j) we consider two distinct situations. The first is to set 
= and have no constraint on (/)jv, corresponding to 
free b.c. for the variables 6, and describing the situa- 
tion in some single molecule experiments |2fij . The sec- 
ond situation, modeling superhelical stresses, is to set a 
fixed linking number constraint 0jv — = ^" ~ cxN, 
a e [0, {N — 1)71 /N], which contains periodic b.c. in (j) as 
the special case a = 27rn/7V, n = 1,2,3... The torque 
r and the total twist thermodynamically con- 

jugated variables, yet as we are explicitly working with 
a finite-size system, there is no equivalence of ensembles 
and both situations lead to different melting properties. 
We will refer to the first situation as the 'torque ensemble ' 
and the second as the 'linking number ensemble '. 



The choice of the energetic parameters is a difficult 
one and unlike for the helix-coil models, no well es- 
tablished set of parameters exists, especially with re- 
spect to the base pair dependence of the different en- 
ergy terms. Morse potential constants for weakly bonded 
A— T vs. strongly bonded C — G base pairs have been de- 
termined by comparison of the Dauxois-Peyrard-Bishop 
model with denaturation experiments |29| | . For the other 
parameters, we follow the classification of El Hassan and 
Calladine |3Q(]. More precisely we take Ks,t inversely pro- 
portional to the slide variance of the step (s, t), and Eg^t 
inversely proportional to the twist variance |3(l . Table 
2]. To obtain explicit values, we first adapt the rela- 
tive strength of the energy parameters such that their 
order of magnitude agrees with the parameters used in 
[2^ . In that case we obtain the correct transition tem- 
perature interval, but a less perfect differential melting 
map (a melting map gives for each base pair the temper- 
ature at which it transforms from closed to open, see [sH ] 
and Section lill A(l . By increasing the relative strength of 
the twist energy, the transition interval is widened, but 
the melting map becomes exact. To compute opening 
probabilities and melting maps, and identify stable vs. 
unstable regions, this last set of parameters is more ade- 
quate. A more detailed comparison with experiment will 
be needed to find the parameters which best fit the phys- 
ical melting transition, but we do not pursue this further 
in this paper. All the explicit numerical values we use are 
given in Appendix ^ To conclude, we mention that in 
the torque ensemble, to first order, sequence specificity 
in the melting process comes from the base pair specific 
Morse potentials, but inhomogeneity in the stacking and 
twist energies has second order effects which are nonethe- 
less important for a detailed identification of the different 
melting domains. In the linking number ensemble, the 
coupling of all angular degrees of freedom leads to more 
complicated sequence specific melting behavior. 



A. Equilibrium solution in the torque ensemble 

Since we are not interested in velocity dependent quan- 
tities, the kinetic energy terms can be integrated directly 
in the partition function, which becomes, up to a multi- 
plicative constant and with free b.c, 

Z = y dri . . . J drjy J d9i . . . J d9N-iri...rN e~^^. 

(2) 

The 6'-integrals factorize, and 
Z= fdn... f dr^T(i)(ri,r2)...T(^-i)('^iV-i,^Jv), 



where for n — 1, . . . , N — 2, 



X 1'^ ^-f3V}"^>(r,r\'x)^l3riicos(x) 



1 



and 



X r '^^ p~ PV}'^ -^\ry .x)^fST a.cos(x) 

J-1 

Vm \ Vs"^\ and v/"'' denote respectively the Morse, 
stacking, and twist energy terms. As we will not need 
spectra of transfer integral operators, there is no need 
for symmetrizing these kernels. 

In order to compute expectation values of the form 
{f {r n) g {cos On)) for suitable test functions / and g, we 
need additional transfer integral operators 



dx 



1 \/l — x- 



-:9{x)e 



-/3Vj {r,r' ,x) pr acos(a:) 



with appropriate modifications for the right-most sites 
N -1 and N. 

Since strand separation and untwisting are directly 
correlated by the twist energy term, it is often sufficient 
to consider the case g = 1, such that we get the simpler 
kernels 



T'f''\r,r') = fir)T^^^Hr,r') 



Tf\r,r') = T<^'^-'\r,r')f{r'). 



(3) 
(4) 



To solve the model numerically, we replace the transfer 
integral operators by finite size transfer matrices. The 
most efficient way for doing this is approximating the 
integrals in the partition function by finite sums using 
Gaussian quadratures js^- For the angular a;-integrals, 
this is straightforward as they already contain the right 
weight function for Gauss-Chebyshev integration. For 
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the radial r-integrals, we first restrict the infinite integra- 
tion domain to a finite interval [a,b], then apply Gauss- 
Legendre integration. Let zj, j = 1, . . . , Mc, be the ze- 
ros of the Mc 'th Chebyshev polynomial, all having equal 
weight tt/Mc- Let z^, j — 1, . . . , Ml, be the zeros of the 
Mi'th Legendre polynomial, — ^{b — a)zj + ^{b + a) 
the zeros transformed to the interval [a, 6], and wj the 
associated weights 232j. 

We obtain the transfer matrix approximation to the 
partition function, 

z = 5](f . . . r(^-i^)^^. = (z;|f w . . . r(^-i)|i;). 



With these matrices, we can then compute, e.g., a pro- 
file n 1-^ {f{rn)g{cos9n)) by the above formulas. By the 
simplicity of the transfer matrix formalism, the computa- 
tional complexity of this procedure clearly increases only 
linearly with N. 

However, even for sequences of moderate length (a few 
kbp with double precision calculations), the left and right 
matrix products have such small entries, that they con- 
sist of round-off error only, and the computations become 
meaningless. This is a common problem and due to the 
extensivity of the free energy. To make this computation 
work for sequences of arbitrary length, we define normal- 
ized left and right vectors: 



where w = (1 1 . . . 1), |-) and (-j are the familiar Dirac 
column, resp. row vector notation, and T*^"^ are the Ml x 
Mi transfer matrices defined by 



(n) _ 



f{N-l) 
ij 



Mc 
IT X - 



(5i,4j,2:fc)g/3racos(zfc) 



Mc 
k—1 



(?i,5j,Zfc)g/3racos(2;fc) 



Likewise matrices T^j^^ are defined as finite approxima- 
tions to the corresponding kernels. 

Different boundary conditions in the radial variable 
can be easily accommodated by changing the vector v: 



for fixed b.c. 



^j, Vi — Sij, for closed, resp. 



open b.c, Vi — /(^j < 12), resp. Vi = /(^^ > 12), and for 
periodic b.c. the inner product {v\ ■ \v) is replaced by the 
trace Tr(-). Here we follow the convention that a base 
pair is 'open' if r — ro > 2A, and / denotes the indicator 
function, I{A) = 1 if the condition A is true. 
Defining left and right matrix products 



M^n) ^ yd) 



for 71=1, 



,7V- 1, and M 



(0) 



M 



{N) 



1, we obtain 



(v\M 



(n-l)^(n)^^(n+l)| 



'v\M 



iN-2)rp{N) 



f 



if {r n)g {cos 0n)) = 

ifirN)) = 



The different transfer matrices T^"^ for n = 1, . . . ,N — 
2 choose between 16 different matrices, one for each nu- 
cleotide step type. These matrices, together with one 
matrix T^-^~^'> for the final bond, are computed first 
and stored on disk. For a given sequence we then com- 
pute and store the left and right matrix products. For a 
given pair (/, g) we compute again first the 16 possible 
matrices T^^^ and the two matrices 2/^"^'' and Tj:^K 



[w 



in). 



,("-!) 



|ji(n) 



„(»)\ 



j'(n) 



W 



with w 



(0) 



,(«-!) 



and while inductively creat- 



ing these vectors we store 



„("+i)\ 



A short calculation reveals that 



{firn)g{cOS 0n)) 

ifirN)) 



, (Af-2)| (N-l), 
CN-l{w\, ' 



involving only normalized vectors, sequence length inde- 
pendent (/, (7)-matrices, and the constants c„, which are 
formed by sequence length independent matrices acting 
on normalized vectors. 

If g = 1, transfer matrices are of the form ©"Q, 
and the formulas are even simpler. Denote by D/ the 
multiplication operator with the function / and by Df 
its diagonal matrix discretization. We get 



ifirn)) 
ifirN)) 



/ (n-l)i (n)\ 
i^L \Wr) 



[W 



(W-2) 



(5) 
(6) 



This method can be easily extended to compute higher 
moments. For example, to compute if {tu) firm)) for 
fixed n and all w, we define f = f)"^ and f = f (') 
for I 7^ n. Writing (•)' to denote expectation with respect 
to these new transfer matrices, we have, for functions 
/>0, 



if{rn)f{rm)) = ifirn)) (/(r™))'. 



(7) 



The practical applicability of the method clearly re- 
lies on the grid size values and Mc, which were 
determined as follows. First we started from the value 
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Ml = 70, which according to Zhang et. al. gives ex- 
act results for the Peyrard-Bishop model. For this value, 
the upper limit of the integration domain has to be set 
to = 40, larger values of h require larger H^- The 
lower limit a can be put equal to 9.7 as the Morse po- 
tential can be considered infinite for smaller values. We 
determined a value Mc = 35 to give accurate results 
in comparison with the MELTSIM program jSj. Like 
in the Peyrard-Bishop model [i^, we then found that 
Ml could be further decreased with negligible error, to 
a value of 36. Further reducing the dimension leads to a 
dramatic change where suddenly all the interesting tran- 
sitional behavior is lost, the chain is either completely 
open, or completely closed. After Ml was minimized, 
we decreased Mc- Around Mc — 20, we loose again all 
interesting behavior, but the transition is less sharp in 
this case. We settled on Mc = 24. 

The computational method presented so far works well 
up to a certain sequence length, where memory becomes 
the bottleneck instead of CPU speed (around 10® bp 
on a typical PC). To treat even longer sequences, the 
sequence is divided into a number of smaller overlap- 
ping subsequences and the probability profiles of those 
are combined to obtain the full length profile. This is 
a standard procedure [l^ Il7j . which however is much 
simpler in a nearest neighbor model than in the long 
range helix-coil models. More precisely, assume we cut 
the sequence of N base pairs into N/Nq subsequences of 
length Nq. To correct for the artificial boundaries thus 
introduced, we compute the opening probabilities for an 
interval [INq — d,{l + l)No + d] but only keep the val- 
ues for the interval [INo, {I + l)A*'o]. If d is much larger 
than the typical correlation length, this gives the exact 
opening probability for the full sequence. In the helix- 
coil model, such a cut is never exact because d is always 
smaller than the interaction length. In Section IIII Al 
we will see that at typical values of T and F (i.e., val- 
ues differentiating between stable and unstable regions), 
the correlation length is typically rather short, a few 100 
base pairs at most. Therefore, a window size of length 
No = 10^ and overlap 2d between 10^ and 10^ leads to an 
exact algorithm for long sequences whose speed is only 
mildly affected by the windowing procedure. 



B. Equilibrium solution in the linking number 
ensemble 

The partition function in the linking number ensem- 
ble is again given by an integral of the form (|2Jl, but 
the angular integrals are now restricted to the subspace 
of [0,7r]^~^ for which the linking number or total twist 
satisfies 

I N-i 

n=l 



for some fixed a G [0, {N — 1)tt/N]. Very often, instead 
of a, the superhelical density a is specified, 

Lk — Lko 



where Lk — (27r) ^ 0„ is the linking number, and 

Lko — (27r)^^ J2n ^n ri+i the ground state, zero torque 
linking number. As we remarked in Section ^ our model 
is only suitable for negative or small positive superhelic- 
ity, and for definiteness we will consider in this section 
only negative torque F, corresponding to negative super- 
helicity a. 

The situation is completely analogous to the standard 
statistical mechanics situation of canonical and grand- 
canonical ensembles: a plays the role of the 'density', F 
the role of a 'chemical potential', and by changing in eq. 
Q the angular integration variables to 6*1, ... , 9n~2, A = 
X^ri ^" ' that the grand-canonical (torque ensem- 

ble) and canonical (linking number ensemble) partition 
functions are related by a Laplace transform 

Z,,(F) = dXe^^'Zi,{-), 

where it is understood that Zik{a) = for a > (N — 
1)tt/N. 

By standard inverse Laplace transform techniques, the 
linking number partition function can be obtained from 
the torque partition function by a contour integration in 
the complex plane 



Zikia) = P 


r^+'°^ dz 






= /3 


r^+'°° dz 


^-l3N[za+Ft,{z)] 


iT-ioo 27ri 





(8) 



where the integral is carried out on a line parallel to the 
imaginary axis, with F < 0, and Ftq{z) is the free energy 
in the torque ensemble. 

Standard statistical mechanics proceeds by choosing a 
line which crosses the real axis at a critical point of the 
harmonic function za + Ftq{z). This point is a saddle 
point and the contour is a path of steepest descent, such 
that for large /37V, the integrand in eq. (jS)) is significantly 
different from zero in a small interval near the real axis 
only. Since we are interested in inhomogeneous, finite 
sequences, we do not consider the question of equivalence 
of ensembles in the thermodynamic limit. 

More precisely, for a given cr < and corresponding a, 
the function 

attains a maximum at some value Fq < 0, namely the 
value Fo for which {Y,n^ri)tq,To = where {■)tq^To 

denotes expectation in the torque ensemble. The line 
passing through Fq is chosen as the integration contour. 
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and we can write 



2'K 



Because of the large parameter f3N, the function 

|g-/3JV(Ft,(ro+«c^)--Ft,(ro))| 

is tightly concentrated around lo — 0, and the integral 
can be restricted to a small interval [— e^Vjejv]- It is im- 
portant to remark that to apply the standard station- 
ary phase expansion, ejv would have to be much smaller 
than {PN)~^^'^, a condition which is typically not fulfilled 
here. An efficient method to numerically compute the re- 
maining integral consists of computing the integrand at 
a number of points and find a cubic splines interpolation 
which can be readily integrated. 

The algorithm to compute expectation values pro- 
ceeds as follows. Like in the previous section, let (/, g) 
be single-site test functions and denote by Z^") the 
partition functions obtained by substituting at position 
n the transfer matrix T^^^ for T^"\ Further denote 

by p[^\r) — {f {rn)g{cos 6n)) tqT the expectation value 
at torque T and analogously p[^\a). Recalling that 
pi;\T) - exp(-/3iV[F4")(r) - Ft,{T)]), we find 



/ duj I 



-if3Nauj„-/3N{Ft^{ro+iu))-Ftg{ro)) 



(10) 



Since the l.h.s. of this equation is obviously real, we 
take the real parts of the integrands before numerically 
computing the integral. The torque expectation values 
p|q ^ (r) are evaluated using the efficient algorithm of Sec- 
tion lll AI which can be easily extended to also return the 
free energy: 



-l3NFtgiT)^lii{v\T'^^K..T 



N~l 



n=l 



„("-!) I j.(n)L„(^')\ 
, (n-l), (A')x 



Hence the algorithm to compute expectation values for 
all n in (Uni is still 0{N), but M times slower than 
in the torque ensemble, where M only depends on the 
number of discretization points chosen to compute the 
w-integrals. 

To prove equivalence of ensembles between the torque 
and linking number ensemble (for the given test func- 
tions) in the thermodynamic limit for homogeneous se- 
quences, we would have to show that the fraction of the 



integrals in equation (|10|l tends to 1. For finite, inho- 
mogeneous sequences, the non-triviality of this fraction 
causes a nonlinear coupling between base pairs that will 
be illustrated in Section IlIIBI 



III. EXAMPLE RESULTS 

A. Torque ensemble 

For easy comparison with the Poland-Scheraga helix- 
coil model, we show example results for the PN/MCS13 
sequence {N = 4608) which is the main example of 'sij. 
This sequence is the pBR322 sequence ,36.] with an insert 
\AAGTTGAACAAAAR]irAAGTTGA at position 972 
I33II ([. . .]x means [. . .] x times repeated). The conclu- 
sions drawn here are equally valid for all other sequences 
we tested. 

In the Peyrard-Bishop and related models a base pair 
is said to be denatured when it is stretched more than 
2 A away from its equilibrium length of 10 A, hence the 
probability of denaturation is given by 



Pn = {h{rn - 12)), 



(11) 



where h{r) is the Heaviside function, h = 1 for r > 
and otherwise. Notice that we only need the simpler 
formulas 1(3)-® to compute melting profiles n i— > p„. 

Figure □ shows the melting profile for the PN/MCS13 
sequence at typical in vivo temperature T = 310 The 
torque value F = —0.042 eV/rad is chosen to give a good 
delineation of unstable regions (p„ « 1). Decreasing F 
increases the number of open base pairs, and increasing 
F has the opposite effect. On the basis of this melting 
profile we identify three unstable regions, the first one 
around position 1000 corresponding to the AT-rich in- 
sert in the pBR322 sequence, and the other two with 
maximums at position 3489 and at position 4423. 
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FIG. 1: PN/MCS13 Opening probability at T = 310 is' and 
r = -0.042 eV/rad. 
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For whole genome sequences of several million base 
pairs, the opening probability is computed by dividing 
the sequence in shorter overlapping subsequences (see the 
end of Section III A|) . To do this correctly, we need to 
know the correlation length, or more precisely the length 
at which the correlation between a base pair and the rest 
of the sequence vanishes. Hence for a fixed base pair n 
we compute, using |7J), 

= {rnTm) - (r„)(r™). 

Figure [3 shows the correlation function for two 
different values oin,n — 3289 in the middle of the second 
opened bubble (see Figure^), and n — 2200 in the largest 
closed region. Clearly, the correlation is much larger in 
the denatured region, but even here it does not extend 
beyond a few 100 base pairs. 
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FIG. 2: Correlation function for n = 3489 (top) and 
n = 2200 (bottom) for the PN/MCS13 sequence at T = 310 K 
and r = -0.042 eV/rad. 

Due to the inhomogeneity of base pair bonding and 
stacking energies, DNA melting is a stepwise process 
with different domains melting at different temperatures. 
This can be visualized by computing differential melt- 
ing curves and melting maps. Let 7 be the fraction of 
open base pairs, 7 = {J2nPn)/^- A differential melting 
curve is a plot of d'j/dT vs. temperature T. A melt- 
ing map is obtained by displaying for each temperature 
the base pairs which have opening probability greater 
than i (shaded area). Such a map gives another picture 
of thermodynamically stable (high melting temperature) 
vs. unstable (low melting temperature) regions in the 
particular sequence. 

Figure 01 shows the differential melting curve (obtained 
by differentiating a cubic splines interpolation of the com- 
puted values 7(r)) and melting map for the PN/MCS13 
sequence under the same condition F = —0.042 as in Fig- 
ure Handle We can clearly identify again the AT-rich 
inserted region around position 1000 which melts first, 



as well as the two unstable regions around positions 3500 
and 4500. 




295 300 305 310 315 320 325 330 335 340 
Temperature (K) 



FIG. 3: PN/MCS13 Differential melting curve and melt- 
ing map (shaded area) (temperature increment 0.5 K, T = 
-0.042 eV/rad). 

Comparison of the differential mel ting curve and melt- 
ing map with the MELTSIM result [sj Figure 3] shows 
first of all that the general shape of the melting curve 
is correct, but the temperature range from a completely 
closed to a completely denatured molecule is about twice 
as large in the helicoidal Peyrard-Bishop model with the 
current set of parameters. On the other hand, the melt- 
ing map as a map depicting the successive melting or- 
der of different regions is in precise agreement with the 
MELTSIM melting map. 

A more systematic determination of the physical value 
of the various energy parameters in the helicoidal model 
is desirable, but since the whole process of fitting model 
computations to experimental results of DNA denatura- 
tion in solution is quite subtle (the experimental results 
also depend on external conditions like, e.g., the solvent 
salt concentration 'flU^), it falls beyond the scope of this 
paper. It should also be pointed out that while such fit- 
ting was important in the early stages of theoretical study 
of DNA melting, present day problems concern more the 
identification of stable and unstable regions and linking 
those to genomic content. As long as the relative strength 
of the various energy terms is kept within certain limits, 
this identification is unaffected by changing the model 
parameters. 

In Figure ^ and |31 we illustrate some of the effects of 
changing the model parameters. Figure 0] shows the dif- 
ferential melting curve and melting map with homoge- 
neous stacking and twist energy terms, where the values 
oi K, E and 6*0 are the averages of the values given in 
Appendix The overall identification of stable vs. un- 
stable regions remains intact, but comparison with Fig- 
ure 13 shows that considerable detail in the melting map 
is lost, with larger regions melting at once. 
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Temperature (K) 

FIG. 4: PN/MCS13 Differential melting curve and melting 
map (shaded area) with homogeneous stacking and twist en- 
ergy {K = 0.1486 el/, E = 0.0942 61^, 6*0 = 34.81°) (tempera- 
ture increment 0.5 K, F = -0M2eV/rad). 

Figure [3 shows the effect of changing the relative 
strength of the stacking and twisting energy terms. 
Again they are taken homogeneous, but now with the 
original values of Barbi et al. . Although with these 
values the transition temperature interval is of the right 
magnitude, the differential melting curve and melting 
map clearly display insufficient detail. Most notably, the 
two distinct unstable regions around positions 3500 and 
4500 are merged into one large region. 
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FIG. 5: PN/MCS13 Differential melting curve and melting 
map (shaded area) with homogeneous stacking and twist en- 
ergy [K = 0.65 eV, E = O.O-ieV, 60 = 34.78°) (temperature 
increment 0.5 if, F = -0.042 eV/rad). 

So far, we have shown results for a chosen value T = 
—0.042 for easy comparison between different figures, but 
other values can be considered as well. At fixed temper- 
ature, increasing T decreases the fraction of open base 



pairs, corresponding to an increase in the phase transi- 
tion temperature in the thermodynamic limit . What 
is perhaps more interesting is the fact that increasing F 
also smooths the differential melting curve (see FigureEl , 
and broadens the transition; decreasing F has the oppo- 
site effect. Heuristically, increasing F effectively increases 
the stiffness of the double stranded DNA, which is indeed 
known to broaden the transition 0, 13 ■ Tfie value F = 
plays no special role in this respect. In contrast, a re- 
cent model which adds angular degrees of freedom 
to the Poland-Scheraga helix-coil model singles out the 
value F = as special and predicts a broadening of the 
transition for F < as well as F > 0. 
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FIG. 6: PN/MCS13 Differential melting curves for F = -0.05, 
0.0, and 0.05 eV/rad (left to right, temperature increment 
0.5 K). 

Finally, we have also tested the performance of the 
transfer matrix algorithm on larger sequences, up to 
about « 3 X 10^. The algorithm was written in Matlab 
and run on a 2.8 GHz PC, and computed times follow the 
line t = 10~* A^ -I- 0.40, with t in seconds. Comparison 
with shows that our algorithm performs as fast as the 
fastest available helix-coil algorithm. 



B. Linking number ensemble 

In this section, we illustrate the solution of the fixed 
linking number ensemble, and compare it to the fixed 
torque ensemble as well as the helix-coil SIDD model 
[isj l , by showing example results for the C-MYC sequence 
(A^ = 3200), available as Example 3 on the WebSIDD 
server [T^ . Again, the qualitative conclusions drawn 
from this example are valid in general. Following the 
outline of Section Hi Bl we start by showing in Figure [7| a 
plot of Ftq (F) + aF for different values of the superhelical 
density cr = -0.06, -0.045, -0.03, -0.015, correspond- 
ing to values a = (1 + a)LkQ/N = 0.572, 0.581, 0.590, 
0.600. As a goes to 0, the graph becomes constant for F 
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smaller than a critical value corresponding to the torque 
induced melting transition observed in the homogeneous 
model j2^l2^ . For non-zero a the graph has a maximum 
at some value Tq(<7) and this is the value we need for 
constructing the integration contour and for comparing 
the linking number and torque ensembles. We emphasize 
here that obtaining a very precise value of the location of 
the maximum is not necessary. Indeed, the integration 
in eq. ^ and H10|l can be carried out along any line par- 
allel to the imaginary axis. Taking a line at or close to 
the maximum will simply ensure that the function to be 
integrated falls off very quickly along this line. 

0.1 25 1 1 1 1 1 1 1 1 1 1 1 




r (eV/rad) 



approximation of Figure and due to the broad maxi- 
mum, better accuracy should not be expected. However, 
it is clear from the fast decay of the function |u(a;)| in 
Figure |H1 that this value of Fq is accurate enough for 
computing expectation values in the linking number en- 
semble. If a more accurate value is desired, such as in 
Figure im below, we can start from this approximate Fq, 
and fine tune it by computing (X^n On)tq.v for some values 
F w Fq until the correct linking number is found. 




0.04 



0) 



FIG. 8: C-MYC Absolute value of u{lo) (eq. O) at cr = 
-0.03 (a = 0.59) and T ^ 2,10 K (cj-interval 5 ■ 10""). 



FIG. 7: C-MYC Ft,(r)-|-ar for a = 0.572, 0.581, 0.590, 0.600 
(top to bottom) at r = 310 K. 

Next we turn our attention to the integrand in cq. ^ , 

u{lj) = (.-l3N(Ft,(To+iu:)-Ft,(ro))^-ipNau> f^^^) 

Figure IHl shows the absolute value of u in a neighborhood 
of a; = for a superhelical density a — —0.03 and temper- 
ature T — 310 For this value of a and T, the critical 
point is given by Fq = —0.04149. As expected, the func- 
tion decays to rapidly, but clearly not rapidly enough 
to apply a stationary phase approximation {{PN)^^/"^ = 
0.003). Figure |5| shows the real part of u, which is the 
function to be integrated to obtain the partition function 
in cq. Both Figure|Hland|Hlare generated by interpo- 
lating between a number of computed data points. Due 
to the oscillations, it is important to compute enough 
data points, we used an interval of Aw = 5 • 10~^. One 
way to determine the accuracy of the numerical approx- 
imation is to check if the expectation value of the su- 
perhelical density matches the imposed value. We find 
((27r)-i Y.n{dn)ik,a " Lko)/Lko = -0.03005 which com- 
pares well with the exact value of —0.03. Similarly, we 
can check how well the value Fq was determined by com- 
puting the expected helical density in the fixed torque 
ensemble with torque Fq, and find a value of —0.03026. 
This is the value obtained by differentiating the splines 
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FIG. 9: C-MYC Real value of u{uj) (eq. ^) at a = -0.03 
(a = 0.59) and T ^ 310 K (cj-interval 5 • 10""). 

To visualize how the linking number constraint affects 
the melting behavior of a sequence by introducing an 
effective long range base pair coupling in the partition 
function, we follow Benham and Bi 17], and compare 
the original C-MYC sequence to a modified sequence 
which differs from the C-MYC sequence in a tiny frag- 
ment only. More precisely, we compute the opening prob- 
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ability for the C-MYC sequence at fixed linking number 
(ct — —0.03) fFigure [TUI top panel), then remove from 
the sequence a small 44 bp segment in the center of the 
main untwisted region (sequence positions 781 - 824), 
and compute the opening probability for this modified 
sequence at the same superhelical density (Figure 1101 
bottom panel). 

For the C-MYC sequence, we find that there are two 
locations that are preferentially opened, a first, large one, 
between positions 760 — 850, and a second, smaller one, 
between position 2900 — 2950, with much higher opening 
probability for the largest region. In agreement with the 
SIDD-model Tt^ we see that a small modification of the 
sequence is sufficient to shift the main opening activity 
to the second region. 



1000 1500 2000 

Sequence position 



1000 1500 2000 

Sequence position 

















1 


















n 





500 1000 1500 2000 2500 3000 

Sequence position' 

















- 







500 1000 1500 2000 2500 3000 

Sequence position 



FIG. 10: Opening probability at T = 310 AT and fixed su- 
perhelical density a = —0.03 for the C-MYC sequence (A'" — 
3200) (top) and the modified C-MYC sequence {N = 3156) 
(bottom). 

In Figure im we show the corresponding opening prob- 
abilities in the fixed torque ensemble. For a fair compar- 
ison, we adjusted the torque values for both sequences 
separately to return a superhelical density expectation 
value a = —0.03 with high accuracy. As a final check 
that we are comparing both ensembles with the right 
parameters, we compute the total fraction of open base 
pairs ^nPni s-iid fii^d that it is equal to 0.018 for 
both top panels of Figure [TUI and 1171 and equal to 0.017 
for both bottom panels. 

The main qualitative difference that can be observed 
between both ensembles is that the effect of removing 
the small segment is much more localized in the torque 
ensemble, and the first region is still dominant. If we 
consider a base pair to be open if p„ > 0.5, like in con- 
structing the melting maps in Section IlII Al we see that 
in the linking number ensemble the open region shifts 
from the left to the right upon modifying the sequence, 
while in the torque ensemble, the open region disappears. 



FIG. 11: Opening probability at T = 310 K in fixed torque en- 
semble for the C-MYC sequence {N = 3200, T = -0.041475, 
(CT)r = -0.03004) (top) and the modified C-MYC sequence 
{N = 3156, r = -0.044365, (cr)r = -0.03003) (bottom). 



IV. CONCLUSIONS AND OUTLOOK 

In this paper we have connected the particle-lattice 
helicoidal Peyrard-Bishop model to more familiar Ising- 
type models for inhomogeneous DNA melting. In the 
simplest setting of a fixed external torque, the model has 
the same melting behavior as the Poland-Scheraga helix- 
coil model. Since the numerical integrations needed to 
compute the profiles in the particle-lattice model can 
be carried out using a limited number of discretiza- 
tion points, and since the interactions are only nearest- 
neighbor, we have obtained a new method to compute 
melting profiles which is both very simple to implement 
and very efficient to execute, and which is therefore 
highly attractive to analyze very long, or even whole 
genome sequences. 

Furthermore we have shown that also the more compli- 
cated setting of a fixed linking number can be treated and 
that the results are in agreement with Benham's SIDD 
model. The algorithm is again simple to implement and 
consists of numerically integrating the fixed torque re- 
sults over a small range of complex torque values. Some 
of the points raised here such as the inequivalence of en- 
sembles are worthwhile of investigating mathematically 
more rigorous in the setting of the homogeneous heli- 
coidal Peyrard-Bishop model to see if they persist in the 
thermodynamic limit. 

The equivalence between nearest-neighbor lattice mod- 
els with continuous degrees of freedom on the one hand, 
and Ising models with loop entropy weights or long-range 
interaction on the other hand, raises more fundamental 
questions as well. A better understanding of this equiv- 
alence will presumably lead to a better understanding of 
nonlinear phenomena in one-dimensional systems. 
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where 



3(0) 



is the average helical twist angle of the 



given step , ta ken from the database of El Hassan and 
Calladine ^ 



APPENDIX A: ENERGY PARAMETERS 

In this appendix we collect the various parameters used 
in the potential energy Q). All lengths are measured in 
A, energies in eV, and angles in rad. 

For the depth Di of the Morse pot entials, we choose 
values close to the value 0.15 of [23, but taking into 
account that a C — G base pair has a 1.5 times stronger 
bond than an A — T base pair. For the widths Ui we take 
the values of Hal. 



Di 
ai 



-- 0.12 
4.2 



0.2 



D3 

as = 



= 0.18 
6.9. 



The equilibrium distance ro is equal to 10. 

The length £n,n+i between successive nucleotides on 
the same DNA strand in the twist energy term is given 
by 



1 = , //i2 + r2 + . 



n+l 



2r„r„+i cos 9^, 



where h = 3.4 is the fixed vertical distance between base 
pairs. The rest length i^^^^^i ^^'^P dependent and given 



= ^ X 
360 



/35.9 32.9 34.8 32.4N 

37.4 31.9 35.1 34.. 

37.8 37.4 31.9 32.9 

\30.6 37.8 37.4 35.9y 



The parameter E is taken inversely proportional to the 
twist angle standard deviations, taken from the same 
database ISOll: 



= 0.4 X 



/0.3030 0.2632 0.2083 0.3571N 
0.1053 0.2703 0.1887 0.2083 
0.2632 0.2500 0.2703 0.2632 

VO.1493 0.2632 0.1053 0.3030y 



Similarly, the stacking energy parameter K is taken in- 
versely proportional to the slide standard deviations of 



K ^O.lx 



/3.5714 1.4085 1.2195 2.0833\ 
0.8130 0.8547 0.9804 1.2195 | 
1.4493 1.1628 0.8547 1.4085 

VO.9174 1.4493 0.8130 3.5714/ 



The constant a in the exponential is put equal to 0.5 as 
in 01. 



[1] D. Poland and H. Scheraga, Theory of helix-coil transi- 
tions in biopolymers (Academic Press, New York, 1970). 

[2] R. Wartell and A. Benight, Phys. Rep. 126, 67 (1985). 

[3] D. Poland, Biopolymers 13, 1859 (1974). 

[4] M. Fixman and J. Freire, Biopolymers 16, 2693 (1977). 

[5] E. Yeramian, Gene 255, 139 (2000). 

[6] E. Carlon, M. Maiki, and R. Blossey, Phys. Rev. Lett. 
94, 17810 (2005). 

[7] R. Blossey and E. Carlon, Phys. Rev. E 68, 061911 
(2003). 

[8] E. T0stesen, F. Liu, T. Jenssen, and E. Hovig, Biopoly- 
mers 70, 364 (2003). 
[9] T. Garel and H. Orland, arXiv:q-bio/0402037 (2004). 
[10] H. Kramers and G. Wannier, Phys. Rev. 60, 252 (1941). 
[11] L. Onsager, Phys. Rev. 65, 117 (1944). 
[12] D. Poland, Biopolymers 73, 216 (2004). 
[13] R. Fye and C. Benham, Phys. Rev. E 59, 3408 (1999). 
[14] C. Benham, Proc. Natl. Acad. Sci. USA 90, 2999 (1993). 
[15] C. Benham, J. Mol. Biol. 255, 425 (1996). 
[16] C.-P. Bi and C. Benham, Bioinformatics 20, 1477 (2004). 

http : //www.genomecenter .ucdavis . edu/benham/ sidd/ 
[17] C. Benham and C.-P. Bi, J. Comp. Biol. 11, 519 (2004). 
[18] M. Peyrard and A. Bishop, Phys. Rev. Lett. 62, 2755 
(1989). 



[19] M. Peyrard, Nonlinearity 17, Rl (2004). 

[20] T. Dauxois, M. Peyrard, and A. Bishop, Phys. Rev. E 

47, R44 (1993). 
[21] M. Barbi, S. Cocco, and M. Peyrard, Phys. Lett. A 253, 

358 (1999). 

[22] S. Cocco and R. Monasson, Phys. Rev. Lett. 83, 5178 
(1999). 

[23] M. Barbi, S. Lepri, M. Peyrard, and N. Theodorakopou- 

los, Phys. Rev. E 68, 061909 (2003). 
[24] D. Cule and T. Hwa, Phys. Rev. Lett. 79, 2375 (1997). 
[25] Y. Zhang, W. Zheng, J. Liu, and Y. Chen, Phys. Rev. E 

56, 7100 (1997). 
[26] T. Strick, M.-N. Dessinges, G. Charvin, N. Dekker, J.-F. 

AUemand, D. Bensimon, and V. Croquette, Rep. Prog. 

Phys. 66, 1 (2003). 
[27] H. Zhou, Z. Yang, and Z. c. On- Yang, Phys. Rev. Lett. 

82, 4560 (1999). 
[28] H. Zhou, Z. Yang, and Z. c. Ou-Yang, Phys. Rev. E 62, 

1045 (2000). 

[29] A. Campa and A. Giansanti, Phys. Rev. E 58, 3585 
(1998). 

[30] M. El Hassan and C. Calladine, Phil. Trans. R. Soc. 

Lond. A 355, 43 (1997). 
[31] R. Blake, J.W.Bizarro, J. Blake, G. Day, S. Delcourt, 



12 



J. Knowlcs, and J. J. SantaLucia, Bioinformatics 15, 370 

(1999). 

[32] W. Press, S. Tcukolsky, W. Vcttorling, and B. Flannery, 
Numerical recipes in C (Cambridge University Press, 
1992). 

[33] R. Blake and S. Delcourt, Nucleic Acids Res. 26, 3323 
(1998). 



[34] E. Carlon, E. Orlandini, and A. Stella, Phys. Rev. Lett. 

88, 198101 (2002). 
[35] T. Garcl, H. Orland, and E. Yeramian, 

arXiv:q-bio/0407036 (2004). 
[36] GenBank/EMBL accession number J01749. 

http : //www . ebi . ac . uk/ embl/ 



